Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC
Non-autoregressive approaches aim to improve the inference speed of
translation models, particularly those that generate output in a one-pass
forward manner. However, these approaches often suffer from a significant drop
in translation quality compared to autoregressive models. This paper introduces
a series of innovative techniques to enhance the translation quality of
Non-Autoregressive Translation (NAT) models while maintaining a substantial
acceleration in inference speed. We propose fine-tuning Pretrained Multilingual
Language Models (PMLMs) with the CTC loss to train NAT models effectively.
Furthermore, we adopt the MASK insertion scheme for up-sampling instead of
token duplication, and we present an embedding distillation method to further
enhance performance. In our experiments, our model outperforms the baseline
autoregressive model (Transformer base) on multiple datasets,
including WMT'14 DE-EN, WMT'16 RO-EN, and
IWSLT'14 DE-EN. Notably, our model achieves better performance
than the baseline autoregressive model on the IWSLT'14 En-De
and WMT'16 En-Ro datasets, even without using distillation data
during training. It is worth highlighting that on the IWSLT'14
DE-EN dataset, our model achieves an impressive BLEU score of
39.59, setting a new state-of-the-art performance. Additionally, our model
exhibits a remarkable speed improvement of 16.35 times compared to the
autoregressive model. Comment: 12 pages, 6 figures
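The two up-sampling options named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `[MASK]` and `<blank>` symbols, and the choice to place the inserted masks directly after each source token are all assumptions, since the abstract does not specify them. The collapse function follows the standard CTC rule (merge adjacent repeats, then drop blanks).

```python
def upsample_duplicate(tokens, ratio):
    """Token duplication: repeat each source token `ratio` times."""
    return [tok for tok in tokens for _ in range(ratio)]


def upsample_mask_insert(tokens, ratio, mask="[MASK]"):
    """MASK insertion: keep each token once and pad it with ratio - 1 masks.

    Placing the masks right after the token is an assumption made for
    illustration only.
    """
    out = []
    for tok in tokens:
        out.append(tok)
        out.extend([mask] * (ratio - 1))
    return out


def ctc_collapse(path, blank="<blank>"):
    """Standard CTC collapse: merge adjacent repeats, then remove blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return out


source = ["wir", "gehen"]
print(upsample_duplicate(source, 2))
print(upsample_mask_insert(source, 2))
print(ctc_collapse(["we", "we", "<blank>", "go", "go"]))
```

Both schemes give the decoder an input longer than the source, which CTC needs so that the output alignment has room for blanks and repeated symbols.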
Second Triangular Hermite Spline Curves and Its Application
Abstract: A class of rational square trigonometric splines is presented, which shares the properties of the standard cubic Hermite interpolation spline. The given spline approximates the interpolated curve more closely than the ordinary polynomial cubic spline. Key words: Hermite spline curve; C2 continuous; Faultage area; Precision
Frustratingly Easy Model Generalization by Dummy Risk Minimization
Empirical risk minimization (ERM) is a fundamental machine learning paradigm.
However, its generalization ability is limited in various tasks. In this paper,
we devise Dummy Risk Minimization (DuRM), a frustratingly easy and general
technique to improve the generalization of ERM. DuRM is extremely simple to
implement: just enlarging the dimension of the output logits and then
optimizing using standard gradient descent. Moreover, we validate the efficacy
of DuRM through both theoretical and empirical analysis. Theoretically, we show that
DuRM derives greater variance of the gradient, which facilitates model
generalization by observing better flat local minima. Empirically, we conduct
evaluations of DuRM across different datasets, modalities, and network
architectures on diverse tasks, including conventional classification, semantic
segmentation, out-of-distribution generalization, adversarial training, and
long-tailed recognition. Results demonstrate that DuRM consistently
improves performance on all tasks in an almost free-lunch manner.
Furthermore, we show that DuRM is compatible with existing generalization
techniques and we discuss possible limitations. We hope that DuRM could trigger
new interest in the fundamental research on risk minimization. Comment: Technical report; 22 pages
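The recipe described in the abstract above (enlarge the output logits, then train as usual) can be sketched in a few lines. This is a hedged illustration rather than the authors' code: the helper names are hypothetical, and restricting prediction to the real classes at inference time is an assumption that the abstract does not state.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def predict(logits, num_real_classes):
    """Pick the best real class, ignoring the dummy logits (assumption)."""
    real = logits[:num_real_classes]
    return max(range(num_real_classes), key=lambda i: real[i])


num_real_classes = 3
num_dummy_classes = 2  # extra output dimensions that never receive a label

# A model with an enlarged head emits num_real + num_dummy logits.
logits = [2.0, 0.5, -1.0, 0.3, 0.1]
assert len(logits) == num_real_classes + num_dummy_classes

# The loss is ordinary cross-entropy; the label is always a real class.
loss = cross_entropy(logits, label=0)
print(round(loss, 4), predict(logits, num_real_classes))
```

The only change from plain ERM in this sketch is the width of the output layer; the loss function and optimizer stay untouched.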
Listen to Minority: Encrypted Traffic Classification for Class Imbalance with Contrastive Pre-Training
Mobile Internet has profoundly reshaped modern lifestyles in various aspects.
Encrypted Traffic Classification (ETC) naturally plays a crucial role in
managing mobile Internet, especially with the explosive growth of mobile apps
using encrypted communication. Although some existing learning-based ETC methods
show promising results, three limitations remain in real-world
network environments: 1) label bias caused by traffic class imbalance, 2)
traffic homogeneity caused by component sharing, and 3) training with reliance
on sufficient labeled traffic. None of the existing ETC methods can address all
these limitations. In this paper, we propose a novel Pre-trAining
Semi-Supervised ETC framework, dubbed PASS. Our key insight is to resample the
original training dataset and perform contrastive pre-training without using
individual app labels directly to avoid label bias issues caused by class
imbalance, while obtaining a robust feature representation to differentiate
overlapping homogeneous traffic by pulling positive traffic pairs closer and
pushing negative pairs away. Meanwhile, PASS designs a semi-supervised
optimization strategy based on pseudo-label iteration and dynamic loss
weighting algorithms in order to effectively utilize massive unlabeled traffic
data and reduce the manual workload of annotating the training dataset. PASS outperforms
state-of-the-art ETC methods and generic sampling approaches on four public
datasets with significant class imbalance and traffic homogeneity, improving
F1 on Cross-Platform215 by 1.31% and on ISCX-17 by 9.12%.
Furthermore, we validate the generality of the contrastive pre-training and
pseudo-label iteration components of PASS, which can adaptively benefit ETC
methods with diverse feature extractors. Comment: Accepted by 2023 20th Annual IEEE International Conference on
Sensing, Communication, and Networking, 9 pages, 6 figures
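The idea of pulling positive pairs closer and pushing negative pairs away, as described in the abstract above, can be illustrated with an InfoNCE-style loss. The abstract does not name the loss PASS uses, so InfoNCE is swapped in here purely as a common example; the function names, the temperature value, and the toy vectors are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: low when the positive is the closest."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    scaled = [s / temperature for s in sims]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_z - scaled[0]


anchor = [1.0, 0.0]
close_positive = [0.9, 0.1]   # similar flow representation
far_negative = [0.0, 1.0]     # dissimilar flow representation

good = info_nce(anchor, close_positive, [far_negative])
bad = info_nce(anchor, far_negative, [close_positive])
print(good < bad)
```

Minimizing this loss moves the anchor toward its positive and away from the negatives without needing the per-app class labels.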